BMC Research Notes — Latest Matching Preprints

1

Vilsmeier, J.; Saadati, M.; Miah, K.; Benner, A.; Doehner, H.; Beyersmann, J.

2026-03-26 oncology 10.64898/2026.03.25.26349169 medRxiv

Top 0.1%

3.7%

Background: In acute myeloid leukemia studies, event-free survival (EFS) is defined as time until treatment failure, relapse, or death, whichever occurs first. Since 2020 and 2022, respectively, the US Food and Drug Administration and the European LeukemiaNet have recommended analysing treatment failures as day-1 events. This data modification can lead to a potentially large drop in the estimated EFS at day 1. If censoring occurs, the Kaplan-Meier estimator obtained from the recoded data underestimates this drop. Our aim is to obtain an unbiased estimate for EFS as a basis for further inference. Methods: We define "event on day 1" as one event type and "event after day 1" as a competing event in the original data and use the Aalen-Johansen estimator of the cumulative incidence curve to estimate event-specific transition probabilities, which are combined in one EFS estimate. To analyse effects on day 1 treatment failure and other post-day-1 EFS events separately, a formal link to cure models is established by equating treatment failures with the "cured" proportion in cure model terminology. Additionally, a variance estimator, confidence intervals, confidence bands, and simultaneous testing procedures are derived. Results: Our new estimation method differs from the Kaplan-Meier estimator in settings in which some treatment failures are censored, as in the interim analysis of the AMLSG 09-09 study. If almost no treatment failures are censored, the two estimation methods do not differ. The cure model and simultaneous testing are able to estimate effects on day 1 treatment failure and other post-day-1 EFS events separately and function independently of whether data is modified. Conclusions: The Kaplan-Meier estimator evaluated on the recoded data underestimates the drop at day 1 if treatment failures are censored. With sufficient follow-up, this bias disappears, and results coincide with our novel approach.
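For readers unfamiliar with the Aalen-Johansen estimator named in this abstract, the following is a minimal sketch of the generic competing-risks version, not the authors' implementation. The function name, the cause coding (0 = censored, 1 = treatment failure, 2 = relapse or death), and the toy data are all assumptions made for illustration; how the preprint combines the event-specific probabilities into its EFS estimate is not specified in the abstract and is not reproduced here.

```python
from collections import Counter


def aalen_johansen(times, causes):
    """Cumulative incidence per cause for right-censored competing-risks data.

    times  : observed times (event or censoring)
    causes : 0 = censored, any other integer = event type (hypothetical coding)
    Returns a list of (time, {cause: cumulative incidence}, event_free_prob).
    """
    event_types = sorted({c for c in causes if c != 0})
    cif = {k: 0.0 for k in event_types}
    surv = 1.0  # all-cause event-free probability just before the current time
    curve = []
    for t in sorted(set(times)):
        at_risk = sum(1 for u in times if u >= t)
        counts = Counter(c for u, c in zip(times, causes) if u == t and c != 0)
        for k in event_types:
            # Increment: P(event-free just before t) * cause-specific hazard at t
            cif[k] += surv * counts[k] / at_risk
        surv *= 1.0 - sum(counts.values()) / at_risk
        curve.append((t, dict(cif), surv))
    return curve


# Toy data (invented): five patients, two of them censored.
demo_times = [1, 2, 2, 3, 4]
demo_causes = [1, 0, 2, 1, 0]
for t, cif, surv in aalen_johansen(demo_times, demo_causes):
    print(t, cif, round(surv, 3))
```

In this generic form the cumulative incidences and the all-cause event-free probability always sum to one, which is why an EFS curve can be assembled from the event-specific pieces.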

2

Impact of AI-Assisted Mammography Reading on Quality Indicators in the Czech Breast Cancer Screening Programme: A Retrospective Study

Veverkova, L.; Dolezalova, Z.; Marackova, V.; Mathew, E.; Urbankova, M.; Ambrozova, M.; Piskovsky, T.; Ngo, O.; Majek, O.

2026-05-26 oncology 10.64898/2026.05.25.26353869 medRxiv

Top 0.1%

2.2%

Objectives: The aim of mammographic screening is the early detection of invasive cancers. Artificial intelligence (AI) may improve diagnosis at earlier stages. The purpose of this study was to retrospectively assess the impact of AI-assisted reading on selected quality indicators. Method: The data source was the Breast Cancer Screening Registry, using data from one screening unit that currently uses AI routinely. The cancer detection rate (CDR), further assessment rate (FAR), and recall rate (RR) in women aged 45-69 were compared between 2023, when AI was used, and 2022, without AI. The statistical evaluation used the chi-square test and logistic regression adjusting for age, a woman's risk level, and the screening round, at a 5% significance level. Results: In 2022, without AI, 4,034 women aged 45-69 were included, compared with 4,049 women in 2023 when AI was used. This study showed a non-significant increase in CDR from 5.0 breast cancers detected per 1,000 women (non-AI assessment) to 5.2 (AI-assisted assessment), p = 0.919; OR (95% CI): 1.034 (0.542-1.974), a significant decrease in the FAR from 5.2% to 3.9%, p < 0.001; OR (95% CI): 0.665 (0.529-0.836), and a non-significant decrease in RR from 2.4% to 1.9%, p = 0.083; OR (95% CI): 0.754 (0.548-1.037). Conclusion: AI has the potential to be a useful tool in the early detection of breast cancer by improving quality through a decrease in FAR and RR, while probably maintaining CDR.

3

A supervised digital game intervention supports language and communication in young children.

Pena, M.; Dehaene-Lambertz, G.; Pino, E.; Pittaluga, E.; Cortes, P.; de la Riva, C.; Palacios, O.; Guevara, P.

2026-04-04 developmental biology 10.64898/2026.04.02.716239 medRxiv

Top 0.1%

2.1%

The role of digital media in early childhood development remains highly debated, particularly regarding its impact on language acquisition. While excessive or unsupervised screen exposure has been linked to poorer outcomes, less is known about whether structured and interactive uses of technology can support learning. Building on previous research, we evaluated a brief, educator-supervised tablet-based intervention in 246 children aged 2-5 years from low- to middle-socioeconomic backgrounds attending public early education centers. Using a pre-post design with matched study and control groups, children completed 4-8 short training sessions (15 minutes each) involving interactive word-image associations spanning multiple linguistic categories. Preschoolers additionally engaged in prompted vocalization. Across age groups (2-3, 3-4, and 4-5 years), children in the intervention showed greater gains in language comprehension than controls, including receptive language in toddlers (β = 0.49, p = 0.009), vocabulary and morphology in younger preschoolers (β = 0.59-0.68, all p < 0.05), and grammar comprehension in older preschoolers (β = 0.30, p = 0.038). These effects were consistent after accounting for child and parental characteristics. Together, these findings suggest that the developmental impact of digital media depends less on exposure itself than on how it is used. When embedded in structured, socially guided interactions, even brief tablet-based activities may support early language development.

4

Assessing the Impact of Interventions on Tuberculosis Control: India Based Modelling Framework

Raj, Y. A.; Parthasarathy, R.; Mitra, M. K.; Mehra, S.

2026-05-22 epidemiology 10.64898/2026.05.20.26353466 medRxiv

Top 0.1%

2.1%

Background India accounts for nearly one-fourth of the global tuberculosis (TB) burden. The country's progress towards elimination of TB is hindered by considerable heterogeneity in behavioural, social, and health system determinants, which influence transmission dynamics and care access. Evidence from the recent national TB prevalence survey showed that almost half of individuals with active disease were asymptomatic, underscoring the limitations of symptom-based case finding. Achieving the End TB targets will therefore require strategies that simultaneously address the substantial pool of individuals with undiagnosed, asymptomatic disease and those symptomatic individuals who do not seek care. Methods We developed a transmission model of TB that explicitly incorporates individuals with asymptomatic disease and those who do not seek care. Model calibration was performed within a Bayesian framework using epidemiological and programmatic data for India. The calibrated model was then used to project the potential impact of interventions on TB incidence and mortality. Results Under the baseline scenario, the estimated TB incidence and mortality rates for 2024 were 180 (163-203) and 24 (18-31) per 100,000 population, respectively. Across all intervention scenarios (targeting improved diagnosis, active case finding, nutrition support, and their combination), the reduction in incidence rate by 2030 ranged from 13% to 60% compared with 2025, while the corresponding decline in mortality rate ranged from 16% to 66%. Conclusion While individual interventions yield measurable reductions in TB incidence and mortality, greater impact is achieved when they are implemented in combination, reflecting the need for a comprehensive, multi-component response towards TB elimination.

5

Effectiveness of RMSSD-Based Adaptive Music Therapy (Skitii) in Reducing Treatment-Related Anxiety in Head and Neck Cancer Patients: Protocol for a Randomized Controlled Trial

Adhikari, P.; M, D.; Subramanium, V.; Krishna, T.; B, A.; Jain, C. B.

2026-05-15 oncology 10.64898/2026.05.13.26353099 medRxiv

Top 0.1%

1.7%

Background: Head and neck cancer (HNC) patients experience clinically significant anxiety and depression in 65-85% of cases during active treatment. Current supportive care lacks personalized, real-time non-pharmacological interventions. Skitii is a novel HRV-adaptive music therapy system that uses continuous RMSSD (root mean square of successive differences) monitoring via a Polar H10 chest sensor to select music in real-time, targeting parasympathetic recovery (RMSSD >=30ms). Methods: This is a prospective, open-label, randomized controlled trial (1:1 allocation) at Yenepoya Medical College Hospital, Mangalore, India. Adults aged 18-75 years with confirmed head and neck cancer (any subsite, Stage I-IV) undergoing radiotherapy and/or chemotherapy with baseline distress (HADS >=8 or NCCN Distress Thermometer >=4) will be enrolled. Participants are randomized to Skitii adaptive music therapy (20-minute sessions, 3 times daily, 3 weeks) or static music therapy control. Skitii uses a two-phase algorithm: Phase 1 (0-2.5 minutes) uses heart rate as a stress proxy for immediate music selection; Phase 2 (2.5-20 minutes) uses RMSSD to adapt music every 2.5 minutes when physiological state changes by >=20%. Primary endpoints are HADS-Anxiety score and resting RMSSD at Week 3. Sample size is 70 (35 per arm), powered at 80% to detect a 2.5-point HADS difference (SD=3.8, alpha=0.05, 15% dropout). Analysis is ANCOVA, intent-to-treat. Discussion: This is the first randomized controlled trial evaluating RMSSD-based adaptive music therapy in cancer patients. The active control design isolates the effect of the adaptive algorithm from music exposure alone. If positive, results will support a scalable, cost-effective supportive care intervention with objective physiological monitoring, and provide the clinical evidence base for CDSCO Class B medical device approval for Skitii in India, with future CE Mark and FDA applications planned. 
Trial Registration: Clinical Trials Registry - India (CTRI), CTRI/2025/11/116732
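The sample-size statement in this protocol can be checked with a short calculation. The sketch below uses the standard normal-approximation formula for comparing two means and then inflates for dropout; the function name and the `one_sided` switch are assumptions, and the abstract does not say whether the test is one- or two-sided or whether an ANCOVA adjustment was applied. Under this plain formula, a one-sided alpha of 0.05 reproduces the stated 35 per arm, whereas a two-sided alpha gives 44 per arm, so the protocol's actual calculation may differ from what is shown here.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(delta, sd, alpha=0.05, power=0.80, dropout=0.15, one_sided=False):
    """Per-arm sample size for a two-group comparison of means.

    Normal-approximation formula n = 2 * ((z_alpha + z_beta) * sd / delta)^2,
    rounded up and then inflated for the expected dropout fraction.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha) if one_sided else z(1 - alpha / 2)
    z_beta = z(power)
    n = ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)
    return ceil(n / (1 - dropout))


# Inputs taken from the abstract: 2.5-point HADS difference, SD = 3.8.
print(n_per_arm(2.5, 3.8, one_sided=True))   # one-sided reading
print(n_per_arm(2.5, 3.8, one_sided=False))  # two-sided reading
```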

6

Metastatic Patterns and Treatment Characteristics of Triple-Negative Breast Cancer in Nigeria: A Retrospective Cohort Study

Sowunmi, A.; Agbakwuru, C.; Aje, E.; Kehinde, O.; Andero, T.; Eze, C. G.; Oshikanlu, B.

2026-06-12 oncology 10.64898/2026.06.10.26355358 medRxiv

Top 0.1%

1.7%

Background: Triple-negative breast cancer (TNBC) is an aggressive breast cancer subtype characterized by the absence of estrogen receptor, progesterone receptor, and human epidermal growth factor receptor 2 expression. It is associated with limited targeted treatment options, early relapse, and a high propensity for visceral metastasis. Data describing metastatic patterns and treatment characteristics of TNBC in Nigeria remain limited. Methods: This retrospective descriptive cohort study included 869 patients with TNBC managed at the Medserve-LUTH Cancer Center, Lagos University Teaching Hospital, Nigeria between June 2019 and June 2024. Demographic, clinicopathologic, metastatic, and treatment-related data were extracted from electronic medical records. Descriptive statistics were used to summarize patient characteristics, metastatic patterns, and treatment profiles. Associations between metastatic disease and selected clinicopathologic and treatment variables were explored using Pearson's chi-square test. Complete-case analysis was applied throughout. Results: The mean age at presentation was 52.09 ± 12.26 years. Most patients were married (79.1%), postmenopausal (64.3%), and of Yoruba ethnicity (56.8%). Advanced disease predominated, with Stage III and Stage IV disease accounting for 42.9% and 35.6% of cases, respectively. Invasive ductal carcinoma was the most common histologic subtype (77.0%), while Grade II tumours constituted 51.3% of graded cases. Surgery was performed in 73.1% of patients, predominantly mastectomy (70.9% of surgical procedures). Chemotherapy was administered to 83.2% of patients, most commonly anthracycline-based regimens (41.8%), while radiotherapy was delivered to 63.5% of patients, with hypofractionated schedules of 42-43 Gy in 15-16 fractions accounting for 47.2% of radiotherapy courses. Metastatic disease was documented in 32.9% of evaluable patients.
Lung metastasis was the most frequent site (62.5%), followed by bone (46.3%), regional lymph node invasion (38.5%), liver (23.0%), and brain (22.6%). Tumour grade and histologic subtype were not significantly associated with metastatic disease, whereas radiotherapy exposure demonstrated a significant association with metastatic status (χ² = 10.35, p = 0.001). Conclusion: TNBC in this Nigerian cohort was characterized by advanced-stage presentation, invasive ductal predominance, extensive use of multimodality treatment, and substantial visceral metastatic burden. Lung metastasis was the most common metastatic site. These findings provide contemporary real-world data on TNBC in Nigeria and highlight the continuing need for earlier diagnosis, timely referral, and sustained investment in comprehensive cancer care services.

7

Development and Pilot Validation of ABHA-O-SHINE: An AI-Ready Oral Health Risk and Insurance Prediction Framework within the Ayushman Bharat Digital Ecosystem

Saxena, Y.; SHRIVASTAVA, L.

2026-04-01 public and global health 10.64898/2026.03.31.26349846 medRxiv

Top 0.1%

1.5%

Background: Oral health remains inadequately integrated within the Ayushman Bharat Digital Mission (ABDM), particularly in terms of structured risk assessment and its linkage to insurance-based decision-making. There is a growing need for scalable models that can connect clinical oral health data with digital health systems and support future artificial intelligence (AI)-driven applications. Aim: To develop and pilot test the ABHA-O-SHINE framework for oral health risk prediction and insurance prioritization, with a future scope for AI integration within the Ayushman Bharat Health Account (ABHA) ecosystem. Materials and Methods: A cross-sectional pilot study was conducted among 126 participants attending the outpatient department of Swargiya Dadasaheb Kalmegh Smruti Dental College and Hospital, Nagpur. Participants were selected based on predefined inclusion and exclusion criteria. Data collection included a structured questionnaire and clinical examination using the WHO Oral Health Assessment Form (2013). A composite risk score (0 to 14) was developed incorporating behavioral and clinical parameters. Participants were categorized into low-, moderate-, and high-risk groups, and corresponding insurance priority levels were assigned. Statistical analysis included descriptive statistics, the chi-square test, Spearman correlation, and binary logistic regression. Results: The majority of participants were categorized under moderate- to high-risk groups. Tobacco use showed a statistically significant association with higher risk levels (p < 0.05). Positive correlations were observed between total risk score and clinical indicators such as DMFT and CPI. Logistic regression analysis identified tobacco use and clinical scores as significant predictors of high-risk categorization. Conclusion: The ABHA-O-SHINE framework demonstrates feasibility in integrating oral health risk assessment with an insurance prioritization model.
The framework is designed to be AI-compatible, enabling future automation through machine learning and image-based analysis within the ABDM ecosystem. Keywords: ABHA, ABDM, Oral Health, Risk Assessment, Insurance, Artificial Intelligence.

8

ChooseMyStat: A Web-Based Interactive Tool for Statistical Test Selection and Analysis Plan Generation in Clinical Research

Srivastava, S.; Punyani, S. R.; Vazalwar, D.; Joshi, A.; Pakhare, A. P.

2026-06-03 medical education 10.64898/2026.06.02.26354730 medRxiv

Top 0.1%

1.5%

Background: Postgraduate medical residents frequently face difficulty in selecting appropriate statistical tests and preparing statistical analysis plans (SAPs) for thesis work. Existing resources often identify statistical tests without guiding implementation, reporting, or software execution. Aims: To describe the development, features, and content validation of ChooseMyStat, a free, open-source, web-based interactive tool for statistical test selection and SAP text generation in clinical research. Methods: ChooseMyStat was developed as a React-based web application using an iterative, AI-assisted development process under direct faculty supervision. The tool uses a branching decision algorithm covering 18 inferential statistical tests, two diagnostic accuracy measures, four agreement/reliability statistics, and four descriptive statistics scenarios. For each recommendation, it generates a SAP template paragraph, a results reporting example, step-by-step JASP instructions, and R code. Content validation was performed using 105 open-access original research articles from 15 broad medical specialties published in Indian journals during 2024-2025. Results: The tool covers commonly used statistical methods, including t-tests, ANOVA, chi-square variants, non-parametric alternatives, correlation, regression (linear, logistic, ordinal), survival analysis, methods for clustered or repeated data, diagnostic accuracy measures, and agreement/reliability statistics. Among 365 statistical tests identified across 105 articles (excluding normality checking procedures), 346 (94.8%) were covered by the tool. Complete coverage of all statistical methods used was observed in 86 of 105 articles (81.9%). Conclusions: ChooseMyStat integrates statistical test selection with implementation guidance, SAP generation, reporting support, and software instructions within a single interface.
The tool may support postgraduate research training by improving accessibility to applied biostatistics guidance.

9

A Web Application for Exploring Distribution in Academic Publications Across Geography and Institutions in India

Hou, Y.; Cohen, E.; Higginbottom, J.; Rountree, L.; Ren, Y.; Wahl, B.; Nyhan, K.; Mukherjee, B.

2026-03-20 health informatics 10.64898/2026.03.18.26348755 medRxiv

Top 0.2%

1.5%

India's national research capacity and infrastructure are unevenly distributed across states and union territories (UTs), contributing to geographic variation in academic publication output. We developed Indiapub, an open-access web application that quantitatively enumerates and visually displays geographic and temporal publication patterns for research products with at least one author affiliated with an Indian institution, using OpenAlex data. The app is designed for ease of use, with automated data retrieval, cleaning, and aggregation. Indiapub allows users to filter publications by topic, publication year range, author position, publication type, minimum citation count, state/UT, and population size of the state/UT where the author institution is located. The app also provides downloadable tables and ranked institution lists by publication count. Its interactive dashboard includes five modules: (i) a map of publication distribution, (ii) time trend plots for nation and state/UT, (iii) publication-share versus population-share plots highlighting over- and underrepresentation, (iv) stacked bar charts of state/UT contributions over time with population benchmarks, and (v) bubble plots relating the Human Development Index to publication volume over time. This tool may support resource prioritization and identification of institutional strengths for trainees, researchers, higher education administrators, and policymakers. To illustrate its utility, we present sample findings derived from the app. For publications across all topics from 2014 to 2025, the largest research participation footprints were observed in Tamil Nadu, Maharashtra, Delhi, Uttar Pradesh, and Karnataka. Tamil Nadu and Delhi were home to three of the highest-publishing institutions nationally: Vellore Institute of Technology, All India Institute of Medical Sciences, and Indian Institute of Technology Delhi. 
We also examined six curated case studies of broad scientific interest: electronic health records (EHR), genome-wide association studies (GWAS), artificial intelligence (AI), development economics, environmental science, and COVID-19. Findings from these case studies revealed over- and underrepresentation in publication output across states and UTs. For example, in EHR publications among high-population states, Tamil Nadu's publication share exceeded its population share by 31.3 percentage points (pp), whereas Bihar's was 12.8 pp lower. Our tool offers insights into India's research landscape across states and UTs with easy-to-digest visuals. Such interactive tools have the potential to serve as a starting point for fostering a more inclusive research ecosystem supporting targeted research policy and planning.

10

Retrospective cohort study extracting coexisting background breast-lesion features from stage I-III invasive breast cancer

Lim, R. J. Y.; Nitar, P.; Lau, K. W.; Leong, L. C. H.; Lim, G. H.; Tan, V. K. M.; Tan, B. K. T.; Tan, E. Y.; Goh, S. S. N.; Hartman, M.; Wong, F. Y.; Li, J.; Joint Breast Cancer Registry

2026-05-22 oncology 10.64898/2026.05.19.26353633 medRxiv

Top 0.2%

1.4%

Background: Background breast features are frequently noted in pathology reports alongside invasive breast cancer but rarely factor into prognosis or treatment decisions. Their relationship to tumor characteristics and patient outcomes remains incompletely characterised. Methods: We conducted a retrospective cohort study of 7,603 patients with Stage I-III invasive breast cancer (diagnosed 1991-2022, age <80 years) from the Joint Breast Cancer Registry in Singapore. Natural language processing (NLP) was applied to 9,754 free-text pathology reports to extract co-existing background breast features, with accuracy validated by dual-reviewer assessment of 200 reports. Unsupervised hierarchical clustering grouped extracted features into three categories. Associations with tumor characteristics were assessed by multinomial logistic regression, and ten-year overall survival by Cox proportional hazards models (median follow-up 9.6 years; 620 deaths). Results: Here we show that NLP-based extraction of background breast features from routine pathology reports achieves an accuracy of over 90% across features. Lobular neoplasia and benign proliferative changes are associated with less aggressive tumor characteristics, whereas early neoplastic and papillary lesions are more prevalent in HER2-enriched and luminal B tumor subtypes. Benign proliferative changes are associated with better survival in age- and year-adjusted models (hazard ratio 0.91, 95% CI 0.86-0.97), but this association is attenuated after adjustment for stage and subtype. Conclusions: NLP-enabled extraction of background breast features from pathology text is feasible at scale. These features reflect tumor biology but do not independently add prognostic information beyond established clinical variables.

11

Impact of Maternal Education on Perinatal Outcomes in Delta State, Nigeria

Oweibia, M.; Timighe, G. C.; Agbedi, E. B.

2026-05-01 epidemiology 10.64898/2026.04.30.26352195 medRxiv

Top 0.2%

1.4%

Background: Perinatal mortality remains a major public health concern in Nigeria despite global progress in maternal and child health. Maternal education has been identified as a key determinant influencing perinatal outcomes through its effects on health literacy, service utilization, and decision-making. However, limited evidence exists on how maternal education directly impacts perinatal outcomes within the context of Delta State, Nigeria. This study therefore investigated the relationship between maternal education and perinatal outcomes, focusing on perinatal mortality, access to healthcare, and educational interventions that enhance maternal health. Methods: A quantitative cross-sectional study design was employed. Data were collected from 400 mothers who delivered in selected public and private health facilities across six Local Government Areas in Delta State, alongside secondary data on perinatal outcomes obtained from hospital records. A structured questionnaire and record extraction form were used to gather information on maternal education, healthcare access, and perinatal indicators. Data were analyzed using SPSS Version 26, applying descriptive statistics, Pearson's correlation, and regression analysis to determine associations between maternal education and perinatal outcomes. Results: Findings revealed a strong inverse relationship between maternal education and perinatal mortality (r = -0.431, p < 0.01), indicating that mothers with higher education levels experienced fewer stillbirths and neonatal deaths. Similarly, maternal education was significantly associated with reduced low birth weight incidence (r = -0.362, p < 0.01) and improved neonatal survival (r = 0.415, p < 0.01). Regression results showed that maternal education accounted for 23.9% of the variance in perinatal outcomes (R² = 0.239, p < 0.001).
Women with tertiary education were more likely to attend antenatal care (94%), deliver in health facilities (91%), and receive postnatal care (89%) compared to those without formal education. Conclusion: The study concludes that maternal education plays a decisive role in improving perinatal outcomes in Delta State by promoting healthcare utilization, enhancing health literacy, and reducing preventable perinatal deaths. Strengthening women's education through formal schooling and community-based literacy programs is vital for achieving equitable maternal and neonatal health outcomes. The study recommends multisectoral collaboration between education and health authorities to integrate maternal health education into national curricula and community outreach initiatives as part of efforts to attain Sustainable Development Goals 3 and 4.

12

Development and assessment of tailored illustrations to enhance community understandings of genetics topics

Arner, A. M.; McCabe, T. C.; Seyler, A.; Zamri, S. N.; A/P Tan Boon Huat, T. B. T.; Tam, K. L.; Kinyua, P.; John, E.; Ngoci Njeru, S.; Lim, Y. A.; Gurven, M.; Nicholas, C.; Ayroles, J.; Venkataraman, V. v.; Kraft, T. S.; Wallace, I. J.; Lea, A. J.

2026-03-19 scientific communication and education 10.64898/2026.03.17.711941 medRxiv

Top 0.2%

1.4%

Objectives: Effective communication about genetics concepts is essential for collaborative anthropological genetics research. However, communication can be challenging because many ideas are abstract and may be especially unfamiliar to communities with limited access to formal education. Indeed, there are no widely adopted models for communicating such information, nor a clear understanding of the social factors that may shape participant engagement. Here, we conducted a qualitative and quantitative, community-driven study to understand how illustrations can support concept sharing with two Indigenous groups: the Orang Asli of Peninsular Malaysia and the Turkana of Kenya. Methods: We used a two-phase approach to create and evaluate how illustrations can bolster communication about genetics concepts. First, we created images illustrating answers to frequently asked questions about genetics, iteratively updating the illustrations based on participant feedback. Second, we conducted 92 interviews to evaluate the finalized illustrations' effectiveness. Finally, we analyzed the interview data using thematic analyses, multivariable modeling, and multiple correspondence analyses to identify patterns in participant understanding and feedback, including age, sex, market integration, and schooling. Results: Participants reported high interest in genetics research (92%) and broadly positive perceptions of the illustrations. Familiar, locally grounded imagery was preferred and associated with greater perceived clarity, while more technical illustrations were more frequently reported as confusing. Quantitative analyses showed strong internal consistency across measures of engagement and understanding, with modest variation by degree of market integration, schooling, and sex. Discussion: Our findings demonstrate that community-specific visualizations, co-developed through iterative feedback, can effectively support engagement with genetics research in participant communities.

13

Precision Imaging to Evaluate Kaposi Sarcoma (PRIME-KS): protocol for a multicountry novel artificial intelligence-based imaging device

Odeny, T. A.; Adhiambo, H. F.; Mangale, D.; Makanga, P. K.; Odeny, B.; Okuku, F.; Zhou, C.; Geng, E.; Carson, J.; Mudhune, V.; Bukusi, E.; Semeere, A.

2026-06-04 oncology 10.64898/2026.06.03.26354815 medRxiv

Top 0.2%

1.4%

Background: Kaposi sarcoma (KS) is the most common cancer among men in several Eastern African countries, yet treatment monitoring relies on imprecise, time-consuming ruler-based measurements defined by the AIDS Clinical Trials Group (ACTG). This method suffers from inter-observer variability, fails to capture lesion height or true geometric area, and performs poorly on dark skin. SkinScan3D (SS3D) is a portable, low-cost, AI-enabled 3D imaging device that provides objective measurements of KS skin lesion area, height, volume, and color. The Precision Imaging to Evaluate Kaposi Sarcoma (PRIME-KS) study evaluates whether SS3D provides more reproducible and accurate lesion measurements than the standard method, and validates its integration into routine clinical workflows in Kenya and Uganda. Methods: PRIME-KS is a multicountry prospective mixed-methods study with two clinical objectives. Objective 1 is a cross-sectional diagnostic accuracy study comparing SS3D with ruler-based measurement in 50 adults with KS (150 lesions) across sites in Kenya and Uganda. Two clinicians independently measure three lesions per participant using both methods. The primary outcomes are the concordance correlation coefficient (CCC) for inter-rater reproducibility and the coefficient of determination for accuracy. Objective 2 is a non-randomized before-and-after pilot study in 100 patients at three sites, evaluating device usability, acceptability, appropriateness, and feasibility using validated instruments, along with time-and-motion studies and activity-based micro-costing. Prior to these clinical objectives, a formative study used focus group discussions, discrete choice experiments, and human-centered design workshops to refine the SS3D device and protocols with end-user input. Discussion: PRIME-KS will provide the first rigorous evaluation of a 3D imaging device for monitoring KS treatment response in routine clinical settings.
If SS3D demonstrates superior reproducibility and clinical utility, it could reduce unnecessary chemotherapy exposure and associated toxicities by enabling earlier, more objective assessment of treatment response. Trial registration: ClinicalTrials.gov NCT06898203, registered 27 March 2025. Pan African Clinical Trials Registry PACTR202603523439856. Keywords: Kaposi sarcoma, SkinScan3D, 3D imaging, treatment monitoring, diagnostic accuracy, implementation science, usability, human-centered design, Kenya, Uganda.
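The concordance correlation coefficient named as a primary outcome in this protocol can be illustrated with Lin's standard formula, CCC = 2·s_xy / (s_x² + s_y² + (mean_x − mean_y)²). The sketch below is a generic implementation with invented measurements; the function name and the toy values are assumptions, and the protocol's own analysis code is not described in the abstract.

```python
def concordance_ccc(x, y):
    """Lin's concordance correlation coefficient for paired measurements.

    Uses population (divide-by-n) variances and covariance, as in Lin's
    original definition. Unlike Pearson's r, the CCC is penalised when one
    rater is systematically higher than the other.
    """
    n = len(x)
    if n != len(y) or n == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Invented lesion diameters (cm) from two hypothetical raters.
rater_a = [1.0, 2.0, 3.0]
rater_b = [2.0, 3.0, 4.0]  # perfectly correlated but shifted by 1 cm
print(concordance_ccc(rater_a, rater_a))
print(concordance_ccc(rater_a, rater_b))
```

The second call shows why the CCC suits reproducibility questions: the two raters are perfectly correlated, yet the constant offset pulls the coefficient well below one.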

14

Machine-Assisted Topic Analysis of Large-Scale Health Experience Data: Identifying Sociodemographic Differences and Evaluating Bias in Large Language Models

Bondaronek, P.; Ward, E.; Beecham, E.; Zhang, E.; Huang, Y.; Ive, J.; Naughton, F.; Wu, H.; Vindrola-Padros, C.

2026-05-22 public and global health 10.64898/2026.05.20.26353755 medRxiv

Top 0.2%

1.3%

Introduction: Large-scale free-text data with socio-demographic information can capture nuanced accounts of lived experience that are difficult to detect in structured measures. However, manual qualitative analysis is difficult to scale, while automated approaches may obscure subgroup variation or introduce bias. This is especially relevant for large language models (LLMs), whose use in qualitative health research is increasing despite limited evaluation in socio-demographically stratified analysis. Objectives: This study examined how socio-demographic differences in health and wellbeing experiences were manifested in a large-scale free-text dataset, and evaluated how different AI-assisted analytic approaches identified these differences. Specifically, it aimed to: (1) identify socio-demographic differences using Machine-Assisted Topic Analysis (MATA); (2) compare MATA outputs with topic modelling combined with LLM-based topic interpretation; and (3) examine potential bias in LLM-based analysis. Methods: We analysed 2,177 valid free-text responses from the UK COVID-19 Wellbeing Tracker, a longitudinal survey of adults recruited during the pandemic. Responses described factors influencing health behaviours, mood, and wellbeing over time. Data were preprocessed and stratified by gender, age, and socioeconomic status (SES). MATA combined topic modelling, using Latent Dirichlet Allocation, with human-led qualitative interpretation of topic keywords and representative responses. The same topic model outputs were then interpreted using an LLM for comparison. Potential LLM bias was assessed using a demographic label-swap crossover design, with bias evaluated through Jaccard lexical similarity, VADER sentiment, and NRC emotion analysis. Grounded Review and Assessment of Computational Evidence (GRACE) was used to evaluate the AI outputs.
Powered by Editorial Manager(R) and ProduXion Manager(R) from Aries Systems Corporation Results: MATA identified meaningful socio-demographic thematic differences in pandemic-related mood and wellbeing across gender, age, and SES. Common themes included disruption, adaptation, uncertainty, routine, and the influence of work, relationships, and health on wellbeing. Male-stratified topics emphasised routines, habits, and coping with external pressures, whereas female-stratified topics were more relational and reflective, focusing on connection, isolation, family wellbeing, and anxiety. Lower SES narratives included practical strain, financial pressure, and loss of control, while higher SES narratives more often reflected adjustment, autonomy, and meaning-making. Older adults described health, gratitude, and family connection, whereas younger adults emphasised work-related stress and competing demands. LLM-based interpretation broadly reproduced the high-level subgroup patterns identified through MATA, but outputs were more generalised, less conceptually differentiated, and showed greater thematic overlap. Bias analysis showed systematic shifts in vocabulary, sentiment, and emotional tone when demographic labels were swapped, suggesting a risk of representational bias. Conclusions: MATA identified meaningful socio-demographic differences while retaining interpretative depth at scale. LLM-based topic interpretation showed utility for rapid thematic summarisation, but produced less conceptually differentiated outputs and was sensitive to demographic framing. The analysis also identified "LLM speak", where outputs appeared coherent but relied on abstract, generalised, and overlapping interpretations. Human oversight, structured qualitative appraisal, and explicit bias evaluation are necessary when using LLMs to analyse socially stratified free-text health data.
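
The abstract above names Jaccard lexical similarity as one of the measures used in the label-swap bias check but does not say how it was computed. As a minimal sketch, assuming simple lowercase word tokenisation (the function name and the tokenisation rule are illustrative assumptions, not the authors' pipeline), the measure could look like this:

```python
import re


def jaccard_similarity(text_a: str, text_b: str) -> float:
    """Jaccard similarity between the word sets of two texts.

    Tokenisation here is an assumption: lowercase runs of letters
    and apostrophes. The study may have used a different scheme.
    """
    tokens_a = set(re.findall(r"[a-z']+", text_a.lower()))
    tokens_b = set(re.findall(r"[a-z']+", text_b.lower()))
    if not tokens_a and not tokens_b:
        # Two empty texts are treated as identical by convention.
        return 1.0
    # Shared vocabulary divided by combined vocabulary.
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Hypothetical LLM outputs for the same topic under swapped labels.
original = "coping with routine"
swapped = "coping with isolation"
print(jaccard_similarity(original, swapped))  # 0.5
```

In a label-swap design, a lower score between the output for the original label and the output for the swapped label would indicate a larger shift in vocabulary.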

15

A Global Health Quality Improvement Project: Enhancing Cervical Cancer Awareness and screening in Nigeria

Umar, I. A.; Shehu, N.; Nagib, N.; Sulley, S.; Idris-Saeed, Z. O.

2026-06-11 public and global health 10.64898/2026.06.09.26355311 medRxiv

Top 0.2%

1.3%

Background Cervical cancer remains a significant global public health challenge, ranking as the fourth most common cancer among women worldwide. According to the World Health Organization (WHO), 604,000 women were diagnosed with cervical cancer globally in 2020, with over 342,000 deaths amongst this group [1]. Despite its high mortality, cervical cancer is largely preventable through early detection and vaccination against human papillomavirus (HPV), which causes nearly all cases of cervical cancer [1,2]. In Nigeria, it is the second most common cancer among women and a leading cause of cancer-related deaths, with low screening rates exacerbating late diagnoses and poor outcomes [1]. Despite global commitments to elimination with Pap smear screening and HPV vaccination, less than 10% of women in Nigeria have undergone screening due to misconceptions, stigma, and limited awareness. Educational interventions may improve awareness and promote screening behaviors. This global health quality improvement (QI) project aimed to enhance cervical cancer awareness and increase Pap smear uptake at the Central Bank of Nigeria (CBN) Clinic in Abuja, Nigeria. Methods In November 2024, we conducted a health education intervention at the Central Bank of Nigeria (CBN) through a structured educational session for male and female CBN staff members. The session focused on cervical cancer prevention, risk factors, and screening guidelines. Additionally, cervical cancer awareness was raised via email, social media, and an electronic bulletin board. Participants completed pre- and post-intervention surveys assessing cervical cancer knowledge across 10 key items and demographic characteristics. Pap smear uptake was assessed using the CBN clinic records for three months before and after the intervention. Institutional approval was obtained from CBN and external institutional review board approval was not required. Results 188 participants attended the health education session with 124 survey responses (70 pre-event, 54 post-event). Participants were mostly women aged 30-39. Post-intervention, eight of ten survey questions showed improved knowledge, with five demonstrating statistically significant gains: understanding Pap smear frequency (p<.001), HPV infection prevention (p=.042), early symptoms of cervical cancer (p=.019), smoking as a risk factor (p=.002), and availability of Pap smears at the CBN clinic (p=.035). Pap smear uptake increased from 5 screenings in the three months pre-intervention to 32 screenings in the three months post-intervention. Participants reported that the sessions provided a safe space to ask questions and address cultural myths and misconceptions. Conclusion This QI initiative demonstrates the positive impact of targeted health education in improving awareness and screening uptake. Recommendations include increasing awareness through public health talks, updating clinicians on current guidelines, and removing unnecessary barriers to HPV vaccination. These findings align with global health efforts to reduce cervical cancer mortality and underscore the potential of QI projects to improve health outcomes in resource-limited settings.

16

Discordance in pleural mesothelioma response classification and modelling of impact on clinical trials

Cowell, G. W.; Roche, J.; Noble, C.; Stobo, D. B.; Papanastasiou, A.; Kidd, A. C.; Tsim, S.; Blyth, K. G.

2026-03-20 oncology 10.64898/2026.03.18.26348731 medRxiv

Top 0.2%

1.3%

Introduction Agreement between radiologists regarding treatment response in Pleural Mesothelioma (PM) is acknowledged to be poor, but downstream effects in clinical trials have not been quantified. Methods We performed a mixed methods study, composed of a multicentre, retrospective cohort study and in silico modelling. CT images and data were retrieved from 4 UK centres regarding chemotherapy-treated patients. Expert radiologists classified response using modified Response Evaluation Criteria In Solid Tumours criteria (mRECIST) v1.1, generating discordance rate (%) and agreement. In silico modelling simulated two-arm trials of an active therapy with intended 80% power and confidence intervals for four endpoints (objective response rate (ORR), disease control rate (DCR), progression-free survival (PFS), overall survival (OS)) covering 95% of the true effect. Actual power and endpoint coverage were modelled against mRECIST misclassification rate (a single reporter equivalent of discordance rate). Consecutive simulations varied misclassification rate from 0-100% in 1% increments, each repeated 10,000 times. Results 172 cases were included. Discordance rate was 35% (60/172), kappa=0.456. In silico modelling demonstrated reduced power and endpoint precision with increasing misclassification. At 17% misclassification, corresponding to the observed 35% discordance, power dropped from 80% to 55% for ORR, 53% for DCR, 65% for PFS and 66% for OS, with endpoint coverage reduced to 88%, 89%, 92% and 92%, respectively. 50/60 (83%) discordances reflected interpretation or measurement differences intrinsic to mRECIST. Discordance was not associated with tumour volume. Conclusions Inconsistent response classification is common in PM and substantially reduces statistical power and endpoint precision in clinical trials.
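
The abstract above reports that power falls as response misclassification rises, but does not describe the simulation design in detail. The following is a rough sketch of the general idea for a binary endpoint such as ORR. Every specific here is an assumption made for illustration: the response rates, arm size, symmetric label flipping, and the pooled two-proportion z-test are not taken from the study, so the output is not expected to reproduce its figures.

```python
import math
import random


def simulate_power(p_control, p_active, n_per_arm, misclass_rate,
                   n_sims=2000, seed=0):
    """Estimate the power of a two-arm trial with a binary response
    endpoint when each recorded response is flipped with probability
    misclass_rate (a simplifying assumption, not the study's model).
    """
    rng = random.Random(seed)
    z_crit = 1.959964  # two-sided 5% critical value of the normal
    rejections = 0
    for _sim in range(n_sims):
        counts = []
        for p in (p_control, p_active):
            responders = 0
            for _patient in range(n_per_arm):
                true_response = rng.random() < p
                flipped = rng.random() < misclass_rate
                # Recorded response differs from truth if flipped.
                responders += true_response != flipped
            counts.append(responders)
        r_control, r_active = counts
        # Pooled two-proportion z-test on the recorded responses.
        pooled = (r_control + r_active) / (2 * n_per_arm)
        se = math.sqrt(pooled * (1 - pooled) * 2 / n_per_arm)
        diff = (r_active - r_control) / n_per_arm
        if se > 0 and abs(diff) / se > z_crit:
            rejections += 1
    return rejections / n_sims


# Hypothetical trial: 20% vs 40% true response, 80 patients per arm.
for rate in (0.0, 0.17, 0.30):
    print(rate, simulate_power(0.20, 0.40, 80, rate))
```

Under these assumptions, flipping labels pulls both recorded response rates toward 50% and narrows the observed difference between arms, which is why the estimated power drops as the misclassification rate increases.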

17

Predicting Depressive Symptoms Among Reproductive-Aged Women in Bangladesh Using Bagging Ensemble Machine Learning on Imbalanced Bangladesh Demographic and Health Survey 2022 Data

Mahmud, S.; Akter, M. S.; Ahamed, B.; Rahman, A. E.; El Arifeen, S.; Hossain, A. T.

2026-04-23 public and global health 10.64898/2026.04.22.26351445 medRxiv

Top 0.2%

1.3%

Background: Depressive symptoms among reproductive-aged women represent a major public health concern in low- and middle-income countries, yet systematic screening remains limited. In most population survey datasets, the low prevalence of depression results in severe class imbalance, which challenges conventional machine learning models. Therefore, we develop and evaluate a bagging-based ensemble machine learning framework to predict depressive symptoms among reproductive-aged women using highly imbalanced Bangladesh Demographic and Health Survey (BDHS) 2022 data. Methods: The sample comprised women aged 15-49 years drawn from BDHS 2022 data. Depressive symptoms were defined using the Patient Health Questionnaire (PHQ-9 ≥ 10). Candidate predictors were drawn from sociodemographic, reproductive, nutritional, psychosocial, healthcare access, and environmental domains. Feature selection was performed using Elastic Net (EN), Random Forest (RF), and XGBoost models. Five classifiers (EN, RF, Support Vector Machine (SVM), K-nearest neighbors (KNN), and Gradient Boosting Machine (GBM)) were trained using both oversampling-based approaches and the proposed ensemble framework. Model performance was evaluated on an independent test set using accuracy, sensitivity, specificity, F1-score, and the normalized Matthews correlation coefficient (normMCC). Results: Approximately 4.8% of women were identified with depressive symptoms. The proposed bagging ensemble framework consistently achieved more balanced predictive performance than oversampling-based models. Average normMCC improved from 0.540 (oversampling) to 0.557 (ensemble). RF and GBM ensembles demonstrated notable improvements in identifying depressive cases, while the EN ensemble achieved the highest overall performance and sensitivity. Threshold optimization yielded stable normMCC across models, indicating robust trade-offs between sensitivity and specificity. Conclusions: Bagging-based ensemble learning provides a more robust and balanced approach than synthetic oversampling for predicting depressive symptoms in highly imbalanced population survey data. This approach has important implications for improving early identification and population-level mental health surveillance in resource-constrained settings.
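
The abstract above reports normMCC without giving its formula. A common convention rescales the Matthews correlation coefficient from [-1, 1] to [0, 1] as (MCC + 1) / 2; the sketch below assumes that convention, and the function name and the example counts are illustrative rather than taken from the study.

```python
import math


def norm_mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Normalised Matthews correlation coefficient in [0, 1].

    Assumes the (MCC + 1) / 2 rescaling. When any margin of the
    confusion matrix is zero, MCC is set to 0 by convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
    return (mcc + 1) / 2


# Hypothetical test set of 1,000 women at the reported 4.8% prevalence,
# scored by a classifier that predicts "no depressive symptoms" for all.
tp, fn = 0, 48
tn, fp = 952, 0
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)                   # 0.952
print(norm_mcc(tp, tn, fp, fn))   # 0.5
```

The example shows why a chance-corrected metric is useful on imbalanced data: the all-negative classifier reaches 95.2% accuracy yet scores 0.5 on normMCC, the value that corresponds to no predictive association.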

18

Cancer Prevalence and Patterns in Kilifi County: A 10-year Retrospective Descriptive Study

Masha, M.; Mbugua, R. W.; Abdullahi, M.; Sheikh, N. A.; Omar, A.; Abdihamid, O.

2026-06-01 oncology 10.64898/2026.05.20.26353643 medRxiv

Top 0.2%

1.3%

Background Cancer is an increasing public health challenge in Kenya, particularly in rural and underserved regions where surveillance systems and diagnostic capacity remain limited. Kilifi County, located along the Kenyan coast, lacks a population-based cancer registry, and data on the local cancer burden are not available. This study aimed to characterize the demographic distribution of patients, the cancer burden in the county, and the management of cancer cases diagnosed at Kilifi County Referral Hospital (KCRH) over ten years. Methods This retrospective study analyzed the patterns of cancer in Kilifi County using patient records from KCRH during the study period (January 1, 2014, to January 1, 2024). Results A total of 101 patients with cancer were identified, 58% female, with a mean age of 54 years. Most patients were from Kilifi North (47%), with a high proportion reporting no formal occupation (41%) or farming (26%). Esophageal and cervical cancers were the most common (18% each), followed by breast and prostate cancers (5% each), with other malignancies occurring infrequently. Histopathology was the primary diagnostic modality (88%). Staging data were incomplete in 70% of cases; among documented cases, the majority presented with advanced disease (21% stage IV). Due to limited local treatment capacity, approximately half of the patients were referred to tertiary centers for chemotherapy, radiotherapy, or surgery. At data cut-off, 43% had died, 25% were on treatment, and 29% were lost to follow-up, with only 2% completing treatment or under follow-up. Conclusions This study demonstrates a substantial cancer burden in Kilifi County and highlights critical gaps in diagnostic capacity, staging, and continuity of care. Strengthening cancer surveillance systems, expanding diagnostic and treatment infrastructure, and establishing a population-based cancer registry are essential to improving cancer outcomes and advancing equitable care in rural Kenya.

19

On the road to early detection: A survey study of barriers and facilitators to community participation in a mobile lung cancer screening program

Cottrell-Daniels, C.; Sadig, N.; Haddan, S.; Roman, S.; Simmons, V. N.; Schabath, M. B.

2026-04-17 epidemiology 10.64898/2026.04.15.26350954 medRxiv

Top 0.2%

1.3%

Background: While a mobile lung cancer screening (mLCS) program can mitigate barriers to access, we conducted a survey to assess barriers and facilitators to mLCS that could inform the implementation of new mLCS programs or modifications to existing programs. Methods: Eligible patients were aged 50 to 80 and had undergone any cancer screening at Moffitt Cancer Center (MCC) between January 1, 2023 and December 1, 2024. A web-based survey administered from May 2025 to June 2025 collected data on health behaviors, barriers, facilitators, screening preferences, and demographics. Descriptive statistics were used to quantify survey responses. Results: Among participants who completed the survey, 73.4% reported no concerns about getting screened in a mobile screening unit, 67.9% reported concern about the cost or whether insurance covered mobile lung cancer screening, and 82.4% reported they would be screened if a voucher or insurance would pay for it. For preferences, 54.1% reported no preference for the time of year for a mobile screening event, 59.6% reported they would be willing to wait up to 30 minutes to get screened, and 44% would travel more than 20 minutes to get screened. There were no statistically significant differences in barriers and facilitators when the analyses were stratified by LCS eligibility. Conclusions: We found mobile lung cancer screening to be acceptable and identified actionable preferences, including daytime weekday events, indoor waiting, short waits, proximity to home, clear cost coverage, and streamlined clinician recommendation.

20

Ethnobotanical survey of plant mosquito repellents: Knowledge, utilization, and application methods for malaria prevention in the Rwenzori Region, Western Uganda

Mugisa, T.; Kimera, E.; Ikiriza, A.; Kakongi, N.; Meble, K.; Andinda, M.; Idehen, C.; Anyanwu, C.; Ungokore, H. Y.; Igwe, M. C.

2026-05-07 scientific communication and education 10.64898/2026.05.04.722777 medRxiv

Top 0.2%

1.2%

Background: Malaria remains a major public health challenge in Uganda, particularly in rural areas where access to conventional vector control tools is limited. Communities often use locally available plants as mosquito repellents, but documentation of the specific plants used, their utilization levels, and application methods in the Rwenzori region is limited. This study aimed to identify the types of plants used locally to repel mosquitoes, assess the level of utilization of plant-based mosquito repellents, and determine the methods of application employed by communities. Methods: A community-based cross-sectional study was conducted from June to December 2024 in the seven districts and one city of the Rwenzori region, Western Uganda. Multi-stage sampling was used to select 173 household heads. Data were collected using a pre-tested, translated (Runyoro, Rutooro, Lukonzo) KoboCollect questionnaire and analyzed descriptively with SPSS version 23. Results: Eighty-six percent of respondents reported using plant-based mosquito repellents, with 55% relying exclusively on plants. The most used plants were Cymbopogon citratus (citronella/lemon grass, 39.9%), Rosmarinus officinalis (rosemary, 25.7%), and Eucalyptus spp. (24.3%). The primary application method was planting repellent plants around the house (51.4%), followed by hanging injured plant parts in windows and doorways (28.4%). Other methods included burning or crushing plant parts and applying extracts/oils. Conclusion: Plant-based mosquito repellents are widely used in the Rwenzori region. This study documents community knowledge and practices that could inform future integrated vector management strategies. Further research is needed to evaluate the entomological and epidemiological effectiveness of the most used plant repellents and the most commonly applied methods.